
    Model-independent rate control for intra-coding based on piecewise linear approximations

    This paper proposes a rate control (RC) algorithm for intra-coded sequences (I-frames) within the context of block-based predictive transform coding that departs from using trained models to approximate the rate-distortion (R-D) characteristics of the video sequence. Our algorithm instead employs piecewise linear approximations of the R-D curve of a frame at the block level. Specifically, it uses information about the rate and distortion of already compressed blocks within the current frame to linearly approximate the slope of the R-D curve of each block. The proposed algorithm is implemented in the High-Efficiency Video Coding (H.265/HEVC) standard and compared with its current RC algorithm, which is based on a trained model. Evaluations on a variety of intra-coded sequences show that the proposed RC algorithm not only attains the overall target bit rate more accurately than the RC algorithm used by H.265/HEVC but is also capable of encoding each I-frame at a more constant bit rate according to the overall bit budget.
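
    The core idea admits a compact illustration. Below is a minimal Python sketch (function and variable names are hypothetical, not taken from the paper's implementation) that approximates the local R-D slope from the (rate, distortion) points of already compressed blocks and evenly distributes the remaining frame budget:

        import numpy as np

        def estimate_rd_slope(rates, distortions):
            """Piecewise linear approximation of the R-D slope from the
            (rate, distortion) points of already compressed blocks. Returns
            the slope of the most recent linear segment (negative, since
            distortion decreases as rate increases)."""
            if len(rates) < 2:
                return None  # not enough points yet; caller falls back to a default
            order = np.argsort(rates)  # consecutive points form the linear segments
            r = np.asarray(rates, dtype=float)[order]
            d = np.asarray(distortions, dtype=float)[order]
            return (d[-1] - d[-2]) / (r[-1] - r[-2] + 1e-12)  # dD / dR

        def bits_for_next_block(remaining_bits, remaining_blocks):
            """Even distribution of the remaining frame budget; a deliberately
            simple allocation rule used here purely for illustration."""
            return remaining_bits / max(remaining_blocks, 1)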

    Rate control for HEVC intra-coding based on piecewise linear approximations

    This paper proposes a rate control (RC) algorithm for intra-coded sequences (I-frames) within the context of block-based predictive transform coding (PTC) that employs piecewise linear approximations of the rate-distortion (RD) curve of each frame. Specifically, it employs information about the rate (R) and distortion (D) of already compressed blocks within the current frame to linearly approximate the slope of the corresponding RD curve. The proposed algorithm is implemented in the High-Efficiency Video Coding (HEVC) standard and compared with the current HEVC RC algorithm, which is based on a trained rate-lambda (R-λ) model. Evaluations on a variety of intra-coded sequences show that the proposed RC algorithm not only attains the overall target bit rate more accurately than the current RC algorithm but is also capable of encoding each I-frame at a more constant bit rate according to the overall bit budget, thus avoiding high bit rate fluctuations across the sequence.
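
    For context on the trained model being compared against: the HEVC reference software's RC maps an allocated rate (in bits per pixel) to a Lagrange multiplier via λ = α·bpp^β and then to a QP through a log-linear relation. A small sketch using the commonly cited constants from the HM implementation (treat the exact values as assumptions if porting):

        import math

        def lambda_from_rate(bpp, alpha=3.2003, beta=-1.367):
            """R-lambda model: lambda = alpha * bpp^beta, with the initial
            alpha/beta values commonly cited for the HM rate control."""
            return alpha * (bpp ** beta)

        def qp_from_lambda(lam, qp_min=0, qp_max=51):
            """Log-linear lambda-to-QP mapping used by the HM rate control."""
            qp = 4.2005 * math.log(lam) + 13.7122
            return int(min(max(round(qp), qp_min), qp_max))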

    Learning optimised representations for view-invariant gait recognition

    Gait recognition can be performed without subject cooperation under harsh conditions, making it an important tool in forensic gait analysis, security control, and other commercial applications. One critical issue that prevents gait recognition systems from being widely accepted is the performance drop when the camera viewpoint varies between the registered templates and the query data. In this paper, we explore the potential of combining feature optimisers and representations learned by convolutional neural networks (CNNs) to achieve efficient view-invariant gait recognition. The experimental results indicate that CNNs learn highly discriminative representations across moderate view variations, and that these representations can be further improved using view-invariant feature selectors, achieving a high matching accuracy across views.
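
    As a rough illustration of the matching stage only (the embedding extractor and the binary feature mask are placeholders, not the paper's pipeline), a view-invariant feature selector can be applied to CNN embeddings before cosine-similarity matching:

        import numpy as np

        def select_features(embeddings, mask):
            """Keep only the dimensions retained by a view-invariant feature
            selector; `mask` is a boolean vector over embedding dimensions."""
            return embeddings[:, mask]

        def match(probe, gallery):
            """Cosine-similarity matching: index of the best gallery template
            for each probe embedding."""
            p = probe / np.linalg.norm(probe, axis=1, keepdims=True)
            g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
            return np.argmax(p @ g.T, axis=1)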

    Multi-camera trajectory forecasting: pedestrian trajectory prediction in a network of cameras

    We introduce the task of multi-camera trajectory forecasting (MCTF), in which the future trajectory of an object is predicted in a network of cameras. Prior works consider forecasting trajectories in a single camera view. Our work is the first to consider the challenging scenario of forecasting across multiple non-overlapping camera views. This has wide applicability in tasks such as re-identification and multi-target multi-camera tracking. To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras. To accurately label this large dataset (600 hours of video footage), we also develop a semi-automated annotation method. An effective MCTF model should proactively anticipate where and when a person will re-appear in the camera network. In this paper, we consider the task of predicting the next camera in which a pedestrian will re-appear after leaving the view of another camera, and present several baseline approaches for this. The labeled database is available online at https://github.com/olly-styles/Multi-Camera-Trajectory-Forecasting.
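
    One of the simplest conceivable baselines for next-camera prediction fits in a few lines; the sketch below (hypothetical, not necessarily one of the paper's baselines) learns a camera-to-camera transition count matrix from training trajectories and predicts the most frequent hand-off:

        import numpy as np

        def fit_transitions(trajectories, num_cameras):
            """Count observed hand-offs between cameras; each trajectory is a
            list of camera IDs in the order the pedestrian visited them."""
            counts = np.zeros((num_cameras, num_cameras))
            for traj in trajectories:
                for src, dst in zip(traj[:-1], traj[1:]):
                    counts[src, dst] += 1
            return counts

        def predict_next_camera(counts, current_camera):
            """Camera in which the pedestrian most likely re-appears after
            leaving `current_camera`."""
            return int(np.argmax(counts[current_camera]))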

    Scene-based imperceptible-visible watermarking for HDR video content

    This paper presents High Dynamic Range Imperceptible-Visible Watermarking for HDR video content (HDR-IVW-V), a scene-detection-based methodology for robust copyright protection of HDR videos using visually imperceptible watermarking. HDR-IVW-V employs scene detection to reduce both computational complexity and undesired visual attention to watermarked regions. Visual imperceptibility is achieved by finding the region of a frame with the highest hiding capacity, on which the Human Visual System (HVS) cannot recognize the embedded watermark. The embedded watermark remains visually imperceptible as long as the normal color calibration parameters are held. HDR-IVW-V is evaluated on PQ-encoded HDR video content, successfully attaining visual imperceptibility, robustness to tone mapping operations, and image quality preservation.
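
    The scene-detection step can be approximated cheaply. The sketch below is a generic histogram-difference detector (not the paper's specific method): a scene cut is declared when the total variation distance between consecutive frames' luma histograms exceeds a threshold, so the watermark embedding region only needs to be recomputed once per scene:

        import numpy as np

        def is_scene_cut(prev_luma, luma, bins=64, threshold=0.4):
            """Scene-change test on PQ-encoded luma in [0, 1], using the total
            variation distance between normalised histograms."""
            h1, _ = np.histogram(prev_luma, bins=bins, range=(0.0, 1.0))
            h2, _ = np.histogram(luma, bins=bins, range=(0.0, 1.0))
            h1 = h1 / max(h1.sum(), 1)
            h2 = h2 / max(h2.sum(), 1)
            return 0.5 * np.abs(h1 - h2).sum() > threshold  # distance in [0, 1]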

    Active contours with weighted external forces for medical image segmentation

    Parametric active contours have been widely used for image segmentation. However, high noise levels and weak edges are the most acute issues that hinder their performance, particularly in medical images. In order to overcome these issues, we propose an external force that weights the gradient vector flow (GVF) field and balloon forces according to local image features. We also propose a mechanism to automatically terminate the contour's deformation process. Evaluation results on real MRI and CT slices show that the proposed approach attains higher segmentation accuracy than snakes using traditional external forces, while allowing initialization using a limited number of manually selected points.
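
    The flavour of the weighting can be sketched as follows (the weights here are a hypothetical function of local gradient strength; the paper derives them from richer local image features): near strong edges the GVF term dominates, while in flat or noisy regions the balloon force keeps the contour moving.

        import numpy as np

        def external_force(gvf, balloon, edge_strength):
            """Blend GVF and balloon forces per snake point. `gvf` and
            `balloon` have shape (num_points, 2); `edge_strength` in [0, 1]
            is the local gradient magnitude at each point."""
            w = edge_strength[:, None]
            return w * gvf + (1.0 - w) * balloon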

    Graph-based transforms based on prediction inaccuracy modeling for pathology image coding

    Digital pathology images are multi-gigapixel color images that usually require large amounts of bandwidth to be transmitted and stored. Lossy compression using intra-prediction offers an attractive solution to reduce the storage and transmission requirements of these images. In this paper, we evaluate the performance of the Graph-based Transform (GBT) within the context of block-based predictive transform coding. To this end, we introduce a novel framework that eliminates the need to signal graph information to the decoder to recover the coefficients. This is accomplished by computing the GBT using predicted residual blocks, which are predicted by a modeling approach that employs only the reference samples and information about the prediction mode. Evaluation results on several pathology images, in terms of the energy preserved and the MSE when a small percentage of the largest coefficients are used for reconstruction, show that the GBT can outperform the DST and DCT.
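
    The GBT construction at the heart of this framework can be sketched directly. In the code below (illustrative only: the edge weighting is a simple inverse-difference rule rather than the paper's prediction-inaccuracy model), both encoder and decoder can derive the same basis from the same predicted residual block, which is why no graph needs to be signalled:

        import numpy as np

        def gbt_basis(predicted_residuals, eps=1.0):
            """GBT basis for an N x N block: build a 4-connected grid graph
            whose edge weights decay with the absolute difference between
            neighbouring predicted residuals, then take the eigenvectors of
            the graph Laplacian (ordered by ascending eigenvalue)."""
            n = predicted_residuals.shape[0]
            W = np.zeros((n * n, n * n))
            idx = lambda r, c: r * n + c
            for r in range(n):
                for c in range(n):
                    for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                        rr, cc = r + dr, c + dc
                        if rr < n and cc < n:
                            w = 1.0 / (eps + abs(predicted_residuals[r, c]
                                                 - predicted_residuals[rr, cc]))
                            W[idx(r, c), idx(rr, cc)] = W[idx(rr, cc), idx(r, c)] = w
            L = np.diag(W.sum(axis=1)) - W   # combinatorial graph Laplacian
            _, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
            return eigvecs                   # columns are the basis vectors

        # Encoder side: coeffs = gbt_basis(predicted).T @ actual_residuals.flatten()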

    Frequency-dependent perceptual quantisation for visually lossless compression applications

    The default quantisation algorithms in the state-of-the-art High Efficiency Video Coding (HEVC) standard, namely Uniform Reconstruction Quantisation (URQ) and Rate-Distortion Optimised Quantisation (RDOQ), do not take into account the perceptual relevance of individual transform coefficients. In this paper, a Frequency-Dependent Perceptual Quantisation (FDPQ) technique for HEVC is proposed. FDPQ exploits the well-established Modulation Transfer Function (MTF) characteristics of the linear transformation basis functions by taking into account the Euclidean distance of an AC transform coefficient from the DC coefficient. As such, in luma and chroma Cb and Cr Transform Blocks (TBs), FDPQ quantises the least perceptually relevant transform coefficients (i.e., the high-frequency AC coefficients) more coarsely. Conversely, FDPQ preserves the integrity of the DC coefficient and the very low-frequency AC coefficients. Compared with RDOQ, which is the most widely used transform coefficient-level quantisation technique in video coding, FDPQ successfully achieves bitrate reductions of up to 41%. Furthermore, the subjective evaluations confirm that the FDPQ-coded video data is perceptually indistinguishable (i.e., visually lossless) from the raw video data for a given Quantisation Parameter (QP).
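
    A minimal sketch of the frequency-dependent scaling (the ramp below is an assumed monotone weighting, not the MTF-derived curve from the paper): each coefficient's quantisation step grows with its Euclidean distance from the DC position, leaving DC and the lowest AC frequencies nearly untouched.

        import numpy as np

        def fdpq_qsteps(base_qstep, n, max_scale=2.0):
            """Per-coefficient quantisation steps for an N x N transform block.
            The DC coefficient at (0, 0) keeps the base step; steps grow with
            Euclidean distance from DC up to max_scale times the base step."""
            u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
            dist = np.sqrt(u ** 2 + v ** 2)
            if dist.max() > 0:
                dist = dist / dist.max()  # normalise to [0, 1]
            return base_qstep * (1.0 + (max_scale - 1.0) * dist)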

    Spatiotemporal adaptive quantization for the perceptual video coding of RGB 4:4:4 data

    Due to the spectral sensitivity phenomenon of the Human Visual System (HVS), the color channels of raw RGB 4:4:4 sequences contain significant psychovisual redundancies; these redundancies can be perceptually quantized. The default quantization systems in the HEVC standard, Uniform Reconstruction Quantization (URQ) and Rate Distortion Optimized Quantization (RDOQ), are not perceptually optimized for the coding of RGB 4:4:4 video data. In this paper, we propose a novel spatiotemporal perceptual quantization technique named SPAQ. Targeting RGB 4:4:4 video data, SPAQ exploits HVS spectral sensitivity-related color masking in addition to spatial and temporal masking, and operates at the Coding Block (CB) level and the Prediction Unit (PU) level. The proposed technique perceptually adjusts the Quantization Step Size (QStep) at the CB level if high-variance spatial data is detected in the G, B and R CBs, and also if high motion vector magnitudes are detected in the PUs. Compared with anchor 1 (HEVC HM 16.17 RExt), SPAQ considerably reduces bitrates, with a maximum reduction of approximately 80%. The Mean Opinion Scores (MOS) in the subjective evaluations, in addition to the SSIM scores, show that SPAQ successfully achieves perceptually lossless compression compared with the anchors.
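
    The adaptation rule can be caricatured in a few lines; the thresholds and scaling factors below are invented for illustration (the paper derives its adjustments from HVS masking models), but they show the shape of the decision:

        import numpy as np

        def spaq_qstep(base_qstep, cb_rgb, mv_magnitude,
                       var_thresh=100.0, mv_thresh=8.0):
            """Perceptual QStep for one Coding Block. `cb_rgb` is a (3, H, W)
            array of G, B and R samples; `mv_magnitude` is the motion vector
            magnitude (in pixels) of the co-located PU. High-variance and
            high-motion blocks tolerate coarser quantisation."""
            scale = 1.0
            if all(chan.var() > var_thresh for chan in cb_rgb):
                scale *= 1.5   # spatial plus colour masking headroom
            if mv_magnitude > mv_thresh:
                scale *= 1.25  # temporal masking headroom
            return base_qstep * scale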

    JNCD-based perceptual compression of RGB 4:4:4 image data

    In contemporary lossy image coding applications, a desired aim is to decrease bits per pixel as much as possible without inducing perceptually conspicuous distortions in RGB image data. In this paper, we propose a novel color-based perceptual compression technique, named RGB-PAQ. RGB-PAQ is based on CIELAB Just Noticeable Color Difference (JNCD) and Human Visual System (HVS) spectral sensitivity. We utilize CIELAB JNCD and HVS spectral sensitivity modeling to separately adjust quantization levels at the Coding Block (CB) level. In essence, our method is designed to capitalize on the inability of the HVS to perceptually differentiate photons in very similar wavelength bands. In terms of application, the proposed technique can be used with RGB 4:4:4 image data of various bit depths and spatial resolutions including, for example, true color and deep color images in HD and Ultra HD resolutions. In the evaluations, we compare RGB-PAQ with a set of anchor methods, namely HEVC, JPEG, JPEG 2000 and Google WebP. Compared with HEVC HM RExt, RGB-PAQ achieves bit reductions of up to 77.8%. The subjective evaluations confirm that the compression artifacts induced by RGB-PAQ prove to be either imperceptible (MOS = 5) or near-imperceptible (MOS = 4) in the vast majority of cases.
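
    The JNCD test at the centre of the method is easy to state concretely. The sketch below computes the CIE76 colour difference ΔE*ab between two CIELAB values and compares it against the commonly cited JNCD threshold of roughly 2.3; how RGB-PAQ maps this outcome to CB-level quantization levels is not reproduced here:

        import numpy as np

        JNCD = 2.3  # commonly cited just-noticeable colour difference in CIELAB

        def delta_e_cie76(lab1, lab2):
            """CIE76 colour difference between two (L*, a*, b*) triples."""
            return float(np.linalg.norm(np.asarray(lab1) - np.asarray(lab2)))

        def distortion_is_noticeable(lab_ref, lab_coded):
            """True if the coding error exceeds one JNCD, i.e. a viewer could
            perceive the colour change."""
            return delta_e_cie76(lab_ref, lab_coded) > JNCD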